{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Softmax"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "`Softmax` is a multi-dimension version of `sigmoid`. Softmax is used when:\n",
    "\n",
    "1. Used as a _softer_ max function, as it makes the max value more pronounced in its output.\n",
    "2. Approximating a probability distribution, because the output of softmax will never exceed $ 1 $ or get below $ 0 $."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Definition\n",
    "\n",
    "softmax($ x_i $) = $ \\frac{e^{x_i}}{\\sum_j e^{x_j}} $\n",
    "\n",
    "With temperature\n",
    "\n",
    "softmax($ x_i $, $ t $) = $ \\frac{e^{\\frac{x_i}{t}}}{\\sum_j e^{\\frac{x_j}{t}}} $"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How does softmax look, and how it works in code?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import numpy as np\n",
    "from matplotlib import pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def softmax(x, t = 1):\n",
    "    exp = np.exp(x / t)\n",
    "\n",
    "    # sums over the last axis\n",
    "    sum_exp = exp.sum(-1, keepdims=True)\n",
    "    \n",
    "    return exp / sum_exp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's see how softmax approaches the max function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array = np.random.randn(5)\n",
    "softer_max = softmax(array)\n",
    "print(array)\n",
    "print(softer_max)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See how the maximum value gets emphasized and gets a much larger share of probability. Applying weighted average would make it even clearer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "average = array.sum() / array.size\n",
    "weighted = array @ softer_max\n",
    "print(average)\n",
    "print(weighted)\n",
    "print(array.max())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See how the weighted average gets closer to the real maximum. To make it even closer to max, reduce the temperature."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "colder_max = softmax(array, 0.1)\n",
    "weighted = array @ colder_max\n",
    "print(average)\n",
    "print(weighted)\n",
    "print(array.max())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Softmax is a generalization of sigmoid. Sigmoid can be seen as softmax($ [x, 0] $). Plotting shows that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.zeros([410, 2])\n",
    "x[:, 0] = np.arange(-200, 210) / 20\n",
    "y = softmax(x)\n",
    "plt.plot(x[:, 0], y[:, 0])\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
